Abstract

We introduce novel blind and semi-blind channel estimation methods for cellular time-division 
duplexing systems with a large number of antennas at each base station. The methods are based on 
the maximum a-posteriori principle given a prior for the distribution of the channel vectors and the 
received signals from the uplink training and data phases. Contrary to the state-of-the-art massive MIMO 
channel estimators which either perform linear estimation based on the pilot symbols or rely on a blind 
principle, the proposed semi-blind method efficiently suppresses most of the interference caused by 
pilot-contamination. The simulative analysis illustrates that the semi-blind estimator outperforms state- 
of-the-art linear and non-linear approaches to the massive MIMO channel estimation problem. 


I. Introduction 

Deployment of large numbers of antennas at the base stations in a cellular network can lead to huge
gains in both spectral and energy efficiency. The increase in energy efficiency is mostly due to the array
gain and the high spectral efficiency is achieved by serving several users simultaneously through spatial 
multiplexing. 

Under favorable propagation conditions [2], namely, i.i.d. channel coefficients for the links from the 
different antennas at a base station to a user terminal, the spatial channels to distinct users are close 
to orthogonal in such a massive MIMO system due to the law of large numbers. Consequently, the 
interference caused by spatial multiplexing can be suppressed with simple signal processing methods, 
e.g., matched filters [2]-[4] or even constant envelope precoding [5]. 

This work extends our previous considerations on semi-blind channel estimation in [1]. 
Even these simple beamforming techniques require accurate channel state information (CSI) at the base
station to enable the interference suppression capability. A closed-loop setup is necessary for channel 
estimation in frequency division duplex (FDD) systems, where the downlink channels must be estimated 
in a downlink training phase and fed back to the base station to allow downlink beamforming. Note that 
the worst-case number of orthogonal pilots necessary to enable the estimation of the vector channels by 
the single-antenna receivers is equal to the number of base station antennas. Given a block fading model, 
where the channels are constant in a coherence interval with fixed number of channel accesses, it is thus 
challenging to implement an accurate closed-loop channel estimation for a massive MIMO system, due 
to the large number of base station antennas [2], [3]. Additionally, the fed back CSI is deteriorated not 
only by the estimation errors, but might also suffer from the quantization error due to the limited rate 
feedback and outdating due to the delay between estimation and application at the base stations. 

Instead, we focus on time-division duplexing (TDD) massive MIMO systems and exploit the channel 
reciprocity to acquire CSI in a timely manner. That is, in each coherence interval, we have an uplink 
training phase where each user transmits a pilot sequence. Thus, in a TDD system, the necessary number 
of orthogonal training sequences is equal to the number of users. In other words, the number of orthogonal 
pilot sequences is dramatically smaller for TDD than for FDD massive MIMO systems, since the number 
of users is much smaller than the number of antennas at the base station. 

For the downlink phase in TDD systems, the base station generates beamforming vectors based on 
the channel estimates obtained during the previous uplink phase. Note that such a use of the uplink
CSI for downlink beamforming is not possible in FDD systems. However, a
sophisticated calibration of the uplink and downlink signal chains is necessary to enable the exploitation 
of the channel reciprocity in TDD systems. In the following, we will assume perfect calibration implying 
that reciprocity between uplink and downlink channels holds. 

As the total number of orthogonal pilot sequences is limited by the channel coherence interval and the 
requirement to actually transmit payload data, the pilot sequences must be reused in neighboring cells. 
Therefore, the channel estimates contain interference from the channels to users in neighboring cells with 
the same pilot sequences since only the pilot sequences within one cell are guaranteed to be orthogonal. 
The interference between the channels of the users during the uplink training leads to interference in 
both the uplink and downlink data transmission phases in a reciprocity based system. This effect is called 
pilot-contamination, which, due to the resulting interference during the data transmission, poses a limit 
on the achievable rate in a massive MIMO system that relies on linear signal processing [3], [4], [6]. 

Several approaches were proposed to tackle the pilot-contamination problem. With coordination between
base stations, the impact of pilot-contamination can be reduced, e.g., by appropriately scheduling the
downlink and training phases for neighboring cells [7] or by coordinating the allocation of pilot-sequences
in neighboring cells [8]-[10]. Optimization of the trade-off between channel accesses employed for
pilot transmission and for data transmission further improves the total throughput and fairness of the 
system [11], [12]. 

Cooperative transmission from several base stations to a single user, so-called coordinated multi-point
(CoMP) or network MIMO, was proposed for multi-cell MIMO systems to enable inter-cell interference
management [13]. To this end, instantaneous CSI is necessary at the central hub. In massive MIMO
systems, however, cooperative transmission strategies, so-called pilot contamination precoding (PCP),
can be designed based on channel statistics [14]-[17]. In the asymptotic limit, when the number of base
station antennas tends to infinity, it is possible to apply zero-forcing techniques to remove all interference 
using only statistical information about the channel [14], [15]. 

Independently of improved resource allocation or PCP, the interference and noise in the channel 
estimates can alternatively be reduced by more sophisticated channel estimation methods. The structure 
in the channel covariance matrices can be exploited by linear minimum mean squared error (MMSE) 
estimation [9] or by heuristics based on projecting onto principal and minor subspaces of the channel 
covariance matrices [18], [19]. For sparse channels, other approaches involving results from compressed 
sensing theory have been proposed [20], [21]. Approaches that exploit the structure of the channel 
might reduce the necessary amount of pilot sequences in FDD massive MIMO systems [21]-[23]. As 
pilot-contamination is also a severe problem in the uplink, similar techniques have been proposed for
equalization [24]. 

Previous work on the suppression of pilot-contamination during the channel estimation by the application
of non-linear signal processing has been reported in [25]. The blind channel estimation approach
proposed in [25] is based on the idealistic assumption that interfering users from neighboring cells always
have weaker channels than the desired users. Under this assumption, it is possible to separate the desired 
uplink signal space and the interference space in the asymptotic limit of an infinite number of users 
and antennas, for a fixed ratio of number of users and antennas, just by applying a principal component 
analysis to the received uplink data. Another related blind channel estimation method was proposed 
in [26], where the principal component analysis of the uplink data together with prior information on 
the quality of the channels to the different users is used to estimate the channel vectors up to a scalar 
ambiguity. Alternatively or additionally to the principal component analysis of the uplink data, also 
information from the decoder can be used to improve the channel estimation [26], [27]. 
A semi-blind channel estimation approach for the interference-free case was proposed in [28]. The
approach is based on the assumption that the blind estimate of the channel is unique up to a unitary 
rotation. The unitary ambiguity is then estimated with the help of pilot signals. In [1], we proposed 
a heuristic semi-blind method based on the maximum-a-posteriori (MAP) problem formulation, which, 
however, is not sufficiently robust in the case of inaccurate subspace information from the principal 
component analysis of the uplink data. 

We take a methodical approach to the uplink channel estimation problem in massive MIMO systems 
with the following contributions. 

• We review results on training-based estimation in Section III and give the analytical solution for the 
maximum-a-posteriori (MAP) estimate based on the received training data, which, for the considered 
Gaussian system model, is equivalent to the training-based minimum mean squared error (MMSE) 
estimate. 

• In Section IV, we derive an analytical solution of the MAP estimator for the blind case, i.e., the 
channel estimation only relies on the received data but no training data is used. The resulting estimate 
is based on the singular value decomposition of the received uplink data and is similar in structure 
to the one proposed in [26]. 

• In Section V, we formulate the MAP estimator for the combined data and training based estimation, 
i.e., semi-blind channel estimation. However, the resulting optimization problem turns out to be 
non-convex. Therefore, a gradient ascent method is briefly described to obtain a local optimum. 

• We discuss heuristic suboptimal methods for semi-blind estimation which can be used as a starting 
point for a gradient based search on the original MAP problem. 

• The numerical results, presented in Section VI to compare the different estimation approaches, 
demonstrate the performance gains using the proposed semi-blind estimation methods. 

A. Notation 

Throughout this paper, $(\cdot)^{\mathrm{H}}$ denotes the conjugate transpose of a matrix and $(\cdot)^{+}$ denotes the pseudo
inverse. The notation $f(x) \propto g(x)$ is short for $g(x) = c f(x)$ where $c$ is a constant which is independent
of $x$. With $0$ and $1$ we denote vectors of all-zeros and all-ones and $I$ denotes the identity matrix. The
operation $[\cdot]_{+}$ is short for $\max(\cdot, 0)$.
II. System Model

We consider a cellular network with $L$ cells. Without loss of generality, we focus on the channel
estimation at the base station in cell 1. Let

$$H_i = [h_{i1}, \ldots, h_{iK}] \in \mathbb{C}^{M \times K} \qquad (1)$$

denote the matrix of channel vectors corresponding to the users $k = 1, \ldots, K$ in cell $i$ to base station 1,
where $M$ is the number of antennas at the base station in cell 1 and $K$ is the number of users per cell,
which, for the sake of notational brevity, we assume is the same for all cells.

The channel vector from user $k$ in cell $i$ to base station 1 is modeled as [3]

$$h_{ik} = \sqrt{\beta_{ik}}\, a_{ik} \qquad (2)$$

where the entries of $a_{ik}$ are i.i.d. complex Gaussian with zero mean and unit variance, and the scaling
factor $\beta_{ik}$ describes the quasi-static shadow fading and the path-loss. Channel vectors to different users
are assumed to be independent. Consequently, we have

$$H_i = A_i B_i^{1/2} \qquad (3)$$

with $A_i = [a_{i1}, \ldots, a_{iK}]$ and $B_i = \operatorname{diag}(\beta_{i1}, \ldots, \beta_{iK})$.
We consider a block fading model, i.e., the channel is constant in a certain coherence interval $T$ and
is independent of the channel in the next coherence interval. Of this coherence interval, $T_{\mathrm{ul}}$ time slots
are used for uplink data transmission, $T_{\mathrm{tr}} \geq K$ time slots for the transmission of uplink pilot sequences,
and the remaining time slots are used for downlink data transmission. We assume that all terminals use
the same transmit power. The received uplink data is given by

$$Y = \sqrt{\rho_{\mathrm{ul}}} \sum_{i=1}^{L} H_i X_i^{\mathrm{H}} + N_{\mathrm{ul}} \in \mathbb{C}^{M \times T_{\mathrm{ul}}} \qquad (4)$$

where $\rho_{\mathrm{ul}}$ denotes the uplink signal-to-noise-ratio (SNR), i.e., the uplink transmit power divided by the
noise power at one receive antenna, $X_i \in \mathbb{C}^{T_{\mathrm{ul}} \times K}$ is the uplink data transmitted by the users in cell $i$,
and $N_{\mathrm{ul}}$ is the additive noise. The entries in $N_{\mathrm{ul}}$ and $X_i$ are assumed to be i.i.d. complex Gaussian with
zero mean and unit variance.

For notational convenience, we collect all variables of the interfering cells in $\bar{H} = [H_2, \ldots, H_L]$ and
$\bar{X} = [X_2, \ldots, X_L]$ leading to

$$Y = \sqrt{\rho_{\mathrm{ul}}}\, H_1 X_1^{\mathrm{H}} + \sqrt{\rho_{\mathrm{ul}}}\, \bar{H} \bar{X}^{\mathrm{H}} + N_{\mathrm{ul}} \in \mathbb{C}^{M \times T_{\mathrm{ul}}}. \qquad (5)$$
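Similarly, the received training data can be written as

$$\Phi = \sqrt{\rho_{\mathrm{tr}}}\, H_1 \Psi_1^{\mathrm{H}} + \sqrt{\rho_{\mathrm{tr}}}\, \bar{H} \bar{\Psi}^{\mathrm{H}} + N_{\mathrm{tr}} \in \mathbb{C}^{M \times T_{\mathrm{tr}}} \qquad (6)$$

where the columns of $\Psi_1 \in \mathbb{C}^{T_{\mathrm{tr}} \times K}$ are the, not necessarily orthogonal, unit-norm training sequences used
by the users in cell 1 and the columns of $\bar{\Psi} = [\Psi_2, \ldots, \Psi_L] \in \mathbb{C}^{T_{\mathrm{tr}} \times (L-1)K}$ are the training sequences
used in the interfering cells. It is necessary to set $\rho_{\mathrm{tr}} = \rho_{\mathrm{ul}} T_{\mathrm{tr}}$ for a resulting average uplink SNR of $\rho_{\mathrm{ul}}$
since the training sequences are unit-norm.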
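
The signal model in (2) to (6) can be sketched numerically as follows. This is a minimal illustration, not the simulation setup of Section VI: the helper names `crandn` and `simulate_uplink` are hypothetical, and reusing one random orthonormal pilot block in every cell is an assumption made only to keep the example short.

```python
import numpy as np

def crandn(rng, *shape):
    """i.i.d. circularly symmetric complex Gaussian entries, zero mean, unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def simulate_uplink(rng, M, K, L, T_ul, T_tr, rho_ul, beta):
    """Draw channels as in (2)-(3) and received signals as in (4) and (6).

    beta has shape (L, K) and holds the slow-fading coefficients beta_ik.
    Assumption of this sketch: all cells reuse the same orthonormal pilots.
    """
    rho_tr = rho_ul * T_tr                                   # unit-norm pilots
    H = [crandn(rng, M, K) * np.sqrt(beta[i]) for i in range(L)]   # H_i = A_i B_i^{1/2}
    X = [crandn(rng, T_ul, K) for _ in range(L)]             # uplink data X_i
    Q, _ = np.linalg.qr(crandn(rng, T_tr, K))                # T_tr x K, orthonormal columns
    Psi = [Q for _ in range(L)]
    Y = np.sqrt(rho_ul) * sum(H[i] @ X[i].conj().T for i in range(L)) + crandn(rng, M, T_ul)
    Phi = np.sqrt(rho_tr) * sum(H[i] @ Psi[i].conj().T for i in range(L)) + crandn(rng, M, T_tr)
    return H, X, Psi, Y, Phi
```

Calling `simulate_uplink(np.random.default_rng(0), M=16, K=2, L=3, T_ul=20, T_tr=4, rho_ul=1.0, beta=np.ones((3, 2)))` returns a data block `Y` of size 16 x 20 and a training block `Phi` of size 16 x 4.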


III. Training Based Estimation 

In this section, we review some well established results on training based estimation that relate to the 
novel approaches presented in the following sections. We will also use these training based estimates as 
a baseline in the evaluation of the different estimation approaches in Section VI. 

One straightforward way to find a channel estimate based on the training data without employing any
prior information is the least squares (LS) estimate

$$\hat{H}_1^{\mathrm{LS}} = \operatorname*{argmin}_{H_1} \left\| \Phi - \sqrt{\rho_{\mathrm{tr}}}\, H_1 \Psi_1^{\mathrm{H}} \right\|_{\mathrm{F}}^{2} \qquad (7)$$

where $\|\cdot\|_{\mathrm{F}}$ denotes the Frobenius norm. Note that the optimizer of (7) is the maximum likelihood (ML)
estimate for the interference-free case due to the assumption that the noise $N_{\mathrm{tr}}$ is complex Gaussian with
i.i.d. zero-mean, unit-variance entries. The solution to (7) is simply given by

$$\hat{H}_1^{\mathrm{LS}} = \frac{1}{\sqrt{\rho_{\mathrm{tr}}}}\, \Phi \left(\Psi_1^{\mathrm{H}}\right)^{+}. \qquad (8)$$

Consequently, $\hat{H}_1^{\mathrm{LS}} = \frac{1}{\sqrt{\rho_{\mathrm{tr}}}} \Phi \Psi_1$ for orthonormal pilot sequences, where we simply correlate the received
signals at all antennas with each of the pilot sequences. Note that the LS estimate does not give satisfactory
results if the interference from the other cells during the pilot phase cannot be neglected.
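
As a small illustration of (8), the LS estimate can be computed with a pseudo inverse. The function name `ls_estimate` is chosen here for the example; the sketch assumes `Psi1` has full column rank.

```python
import numpy as np

def ls_estimate(Phi, Psi1, rho_tr):
    """Least-squares estimate (8): Phi (Psi1^H)^+ / sqrt(rho_tr).

    Phi:  M x T_tr received training block
    Psi1: T_tr x K pilot matrix of the desired cell (full column rank assumed)
    """
    return Phi @ np.linalg.pinv(Psi1.conj().T) / np.sqrt(rho_tr)
```

In the noise-free, interference-free case this recovers the channel exactly, and for pilots with orthonormal columns it reduces to the correlation `Phi @ Psi1 / np.sqrt(rho_tr)`.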

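We also formulate the maximum likelihood (ML) problem for the joint estimation of both the desired
and the interfering channels, as

$$(\hat{H}_1^{\mathrm{ML}}, \hat{\bar{H}}^{\mathrm{ML}}) = \operatorname*{argmax}_{(H_1, \bar{H})} f_{\Phi | H_1, \bar{H}}(\Phi \,|\, H_1, \bar{H}) \qquad (9)$$

with

$$f_{\Phi | H_1, \bar{H}}(\Phi \,|\, H_1, \bar{H}) \propto \exp\left( - \left\| \Phi - \sqrt{\rho_{\mathrm{tr}}}\, H_1 \Psi_1^{\mathrm{H}} - \sqrt{\rho_{\mathrm{tr}}}\, \bar{H} \bar{\Psi}^{\mathrm{H}} \right\|_{\mathrm{F}}^{2} \right). \qquad (10)$$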
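Therefore, the log-likelihood function reads as

$$l_{\mathrm{tr}}(H_1, \bar{H}) = - \left\| \Phi - \sqrt{\rho_{\mathrm{tr}}}\, H_1 \Psi_1^{\mathrm{H}} - \sqrt{\rho_{\mathrm{tr}}}\, \bar{H} \bar{\Psi}^{\mathrm{H}} \right\|_{\mathrm{F}}^{2}. \qquad (11)$$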


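Setting the derivative w.r.t. $H_1^{*}$ and $\bar{H}^{*}$ to zero, we get the normal equations

$$[H_1, \bar{H}] \begin{bmatrix} \Psi_1^{\mathrm{H}} \Psi_1 & \Psi_1^{\mathrm{H}} \bar{\Psi} \\ \bar{\Psi}^{\mathrm{H}} \Psi_1 & \bar{\Psi}^{\mathrm{H}} \bar{\Psi} \end{bmatrix} = \frac{1}{\sqrt{\rho_{\mathrm{tr}}}}\, \Phi\, [\Psi_1, \bar{\Psi}]. \qquad (12)$$

Since $[\Psi_1, \bar{\Psi}] \in \mathbb{C}^{T_{\mathrm{tr}} \times LK}$, the normal equations (12) do not have a unique solution for $H_1$ and $\bar{H}$ if the
number of pilot symbols $T_{\mathrm{tr}}$ is smaller than the total number of users $LK$ in the network. Note that if
we set $\bar{H} = 0$ in (9), i.e., if we neglect the channels to the interfering users, we end up with the least
squares problem (7).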

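If the priors are known, we can formulate the training based maximum a-posteriori (MAP) estimator

$$(\hat{H}_1^{\mathrm{TR}}, \hat{\bar{H}}^{\mathrm{TR}}) = \operatorname*{argmax}_{(H_1, \bar{H})} f_{\Phi | H_1, \bar{H}}(\Phi \,|\, H_1, \bar{H})\, f_{H_1}(H_1)\, f_{\bar{H}}(\bar{H}) = \operatorname*{argmax}_{(H_1, \bar{H})} l_{\mathrm{tr}}(H_1, \bar{H}) + l_{\mathrm{pr}}(H_1, \bar{H}) \qquad (13)$$

with

$$l_{\mathrm{pr}}(H_1, \bar{H}) = -\operatorname{tr}\left( H_1 B_1^{-1} H_1^{\mathrm{H}} \right) - \operatorname{tr}\left( \bar{H} \bar{B}^{-1} \bar{H}^{\mathrm{H}} \right) \qquad (14)$$

due to (3) where $\bar{B} = \operatorname{blkdiag}(B_2, \ldots, B_L)$, and $l_{\mathrm{tr}}(H_1, \bar{H})$ can be found in (11).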

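The linear system of equations for the optimizer reads as

$$[H_1, \bar{H}] \begin{bmatrix} \Psi_1^{\mathrm{H}} \Psi_1 + \frac{1}{\rho_{\mathrm{tr}}} B_1^{-1} & \Psi_1^{\mathrm{H}} \bar{\Psi} \\ \bar{\Psi}^{\mathrm{H}} \Psi_1 & \bar{\Psi}^{\mathrm{H}} \bar{\Psi} + \frac{1}{\rho_{\mathrm{tr}}} \bar{B}^{-1} \end{bmatrix} = \frac{1}{\sqrt{\rho_{\mathrm{tr}}}}\, \Phi\, [\Psi_1, \bar{\Psi}] \qquad (15)$$

which is obtained by setting the derivatives w.r.t. $H_1^{*}$ and $\bar{H}^{*}$ to zero and has a unique solution due to
the regularization by $B_1^{-1}$ and $\bar{B}^{-1}$. We explicitly solve (15) for $H_1$ leading to

$$H_1 \left( \Psi_1^{\mathrm{H}} \left( I - \bar{\Psi} \left( \bar{\Psi}^{\mathrm{H}} \bar{\Psi} + \tfrac{1}{\rho_{\mathrm{tr}}} \bar{B}^{-1} \right)^{-1} \bar{\Psi}^{\mathrm{H}} \right) \Psi_1 + \tfrac{1}{\rho_{\mathrm{tr}}} B_1^{-1} \right) = \frac{1}{\sqrt{\rho_{\mathrm{tr}}}}\, \Phi \left( I - \bar{\Psi} \left( \bar{\Psi}^{\mathrm{H}} \bar{\Psi} + \tfrac{1}{\rho_{\mathrm{tr}}} \bar{B}^{-1} \right)^{-1} \bar{\Psi}^{\mathrm{H}} \right) \Psi_1.$$

Application of the matrix inversion lemma gives

$$H_1 \left( \Psi_1^{\mathrm{H}} \left( I + \rho_{\mathrm{tr}} \bar{\Psi} \bar{B} \bar{\Psi}^{\mathrm{H}} \right)^{-1} \Psi_1 + \tfrac{1}{\rho_{\mathrm{tr}}} B_1^{-1} \right) = \frac{1}{\sqrt{\rho_{\mathrm{tr}}}}\, \Phi \left( I + \rho_{\mathrm{tr}} \bar{\Psi} \bar{B} \bar{\Psi}^{\mathrm{H}} \right)^{-1} \Psi_1.$$

Substituting the definitions of $\bar{\Psi}$ and $\bar{B}$ yields

$$\hat{H}_1^{\mathrm{TR}} = \frac{1}{\sqrt{\rho_{\mathrm{tr}}}}\, \Phi \left( I + \rho_{\mathrm{tr}} \sum_{i=2}^{L} \Psi_i B_i \Psi_i^{\mathrm{H}} \right)^{-1} \Psi_1 \times \left( \Psi_1^{\mathrm{H}} \left( I + \rho_{\mathrm{tr}} \sum_{i=2}^{L} \Psi_i B_i \Psi_i^{\mathrm{H}} \right)^{-1} \Psi_1 + \tfrac{1}{\rho_{\mathrm{tr}}} B_1^{-1} \right)^{-1}. \qquad (16)$$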
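For our model, the MAP estimate is also the minimum mean squared error (MMSE) estimate, because
$H_1$, $\bar{H}$, and $\Phi$ are jointly Gaussian.

If we use the same orthonormal training sequences $\Psi_i = \Psi_1$ in all cells $i$, (16) simplifies to

$$\hat{H}_1^{\mathrm{TR}} = \frac{1}{\sqrt{\rho_{\mathrm{tr}}}}\, \Phi \Psi_1 B_1 \left( \frac{1}{\rho_{\mathrm{tr}}} I + \sum_{i=1}^{L} B_i \right)^{-1}. \qquad (17)$$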

In other words, the MMSE estimate is simply a scaled LS estimate [cf. (8)] in this case and consequently, 
the performance for linear matched filter or zero-forcing precoding based on these estimation methods is 
equivalent. Note that this statement is not true for the more general case of correlated channel coefficients,
i.e., $h_{ik} \sim \mathcal{N}_{\mathbb{C}}(0, R_{ik})$ where $R_{ik}$ is not a scaled identity matrix. In this case, the MMSE estimation
can significantly outperform the LS estimation, depending on the structure of the channel covariance 
matrices. 
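
The relation between the general MMSE expression (16) and its scaled-LS special case (17) can be checked numerically. The following sketch uses hypothetical function names and represents each diagonal matrix $B_i$ by the array of its diagonal entries; it is meant as a sanity check, not as an optimized implementation.

```python
import numpy as np

def mmse_estimate(Phi, Psi, B, rho_tr):
    """Training-based MAP/MMSE estimate of H_1 following (16).

    Psi: list of L pilot matrices (T_tr x K); Psi[0] belongs to the desired cell.
    B:   list of L length-K arrays with the diagonals of B_i.
    """
    T_tr = Psi[0].shape[0]
    R = np.eye(T_tr, dtype=complex)
    for Psi_i, b_i in zip(Psi[1:], B[1:]):
        R += rho_tr * (Psi_i * b_i) @ Psi_i.conj().T      # I + rho_tr * sum_i Psi_i B_i Psi_i^H
    Rinv_Psi1 = np.linalg.solve(R, Psi[0])
    inner = Psi[0].conj().T @ Rinv_Psi1 + np.diag(1.0 / (rho_tr * B[0]))
    return Phi @ Rinv_Psi1 @ np.linalg.inv(inner) / np.sqrt(rho_tr)

def scaled_ls_estimate(Phi, Psi1, B, rho_tr):
    """Special case (17): the same orthonormal pilots are used in all cells."""
    scale = B[0] / (1.0 / rho_tr + sum(B))                # per-user scaling factor
    return (Phi @ Psi1 / np.sqrt(rho_tr)) * scale
```

With identical orthonormal pilots in every cell, both functions return the same matrix, which makes the per-user scaling of the LS estimate explicit.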

In practical systems, the number of pilot symbols is significantly smaller than the total number of
users in the network. Thus, the training-based estimates are subject to pilot contamination because of
the necessary reuse of pilot sequences. With $T_{\mathrm{tr}} = K$ and the same set of orthonormal pilot sequences
reused in each cell, the least squares estimate is given by

$$\hat{H}_1^{\mathrm{LS}} = H_1 + \sum_{j=2}^{L} H_j + \frac{1}{\sqrt{\rho_{\mathrm{tr}}}}\, N_{\mathrm{tr}} \Psi_1. \qquad (18)$$

The effect of the interference during the uplink channel estimation is clearly visible. Furthermore, note
that $N_{\mathrm{tr}} \Psi_1$ has the same distribution as $N_{\mathrm{tr}}$ due to the assumption of orthonormal pilots.
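
The decomposition in (18) can be reproduced with a few lines of code. The dimensions and the random unitary pilot matrix below are arbitrary choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
M, K, L, rho_tr = 8, 3, 4, 10.0

def crandn(*shape):
    """i.i.d. complex Gaussian entries, zero mean, unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

Psi1, _ = np.linalg.qr(crandn(K, K))          # T_tr = K orthonormal pilots, reused in all cells
H = [crandn(M, K) for _ in range(L)]          # H[0] is the desired channel H_1
N_tr = crandn(M, K)

# Received training block (6) when every cell uses the same pilots
Phi = np.sqrt(rho_tr) * sum(H) @ Psi1.conj().T + N_tr

H_ls = Phi @ Psi1 / np.sqrt(rho_tr)                      # LS estimate for orthonormal pilots
contaminated = sum(H) + N_tr @ Psi1 / np.sqrt(rho_tr)    # right-hand side of (18)
```

The arrays `H_ls` and `contaminated` agree up to rounding, so the LS estimate of $H_1$ contains the channels of all users that share the same pilot.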

IV. Blind Estimation 

Previous work on blind channel and subspace estimation for massive MIMO networks is based on
the asymptotic orthogonality of both channel vectors and data symbol sequences for a large number of
antennas $M$ and a large number of received uplink data signals $T_{\mathrm{ul}}$ [25], [26].

We refine the algorithm of [26] for the multi-cell case, under the assumption that the slow fading
coefficients $\beta_{ik}$ to all users can be learned at the base stations and are therefore known (see, e.g., [14]).

For blind channel estimation, the notational differentiation between desired and interfering channels is
not necessary. Hence, we collect all channels in a single matrix

$$H = [H_1, \ldots, H_L] = A B^{1/2} \qquad (19)$$

with the corresponding matrix of slow fading coefficients $B = \operatorname{diag}(B_1, \ldots, B_L)$ and the matrix of fast
fading coefficients $A = [A_1, \ldots, A_L]$ with i.i.d. complex Gaussian, unit-variance, zero-mean entries. In
this section, we will derive the analytical result for the MAP estimator in the blind case, that is,

$$\hat{H}^{\mathrm{BL}} = \operatorname*{argmax}_{H} f_{Y|H}(Y \,|\, H)\, f_{H}(H). \qquad (20)$$

Note that the formulation in (20) is only based on the received data $Y$ [see (5)] but is independent of
the received pilot signal $\Phi$ [see (6)]. Thus, a quasi-closed-form solution can be obtained.

Theorem 1. For $M > KL$, the blind MAP channel estimate is unique up to an unknown complex phase
for each channel vector, and the SVD of one possible optimizer

$$\hat{H}^{\mathrm{BL}} = W \Sigma T^{\mathrm{H}} \qquad (21)$$

can be calculated from the SVD of the uplink data $Y = U \Xi V^{\mathrm{H}}$ as follows.

The matrix of left singular vectors $W = U_{1:KL}$ is a matrix with columns 1 to $KL$ of $U$.

The matrix of right singular vectors $T = \Pi \in \{0, 1\}^{KL \times KL}$ is a permutation matrix, such that the
entries along the diagonal of $\Pi^{\mathrm{T}} B^{-1} \Pi$ are sorted ascendingly.

The singular values on the diagonal of $\Sigma$ are given by



Proof: The proof can be found in Appendix A ■ 
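
For readers who want to experiment, the per-singular-value problem can be sketched numerically. The closed form below is derived here from the reconstructed expressions (39) and (40) and is not quoted from the theorem: with $W$ and $T$ fixed as stated, each squared singular value $s = \sigma^2 \geq 0$ is assumed to maximize $f(s) = \xi^2 s / (s + 1/\rho_{\mathrm{ul}}) - T_{\mathrm{ul}} \log(\rho_{\mathrm{ul}} s + 1) - s/\beta$, where $\xi$ is the matching singular value of $Y$ and $\beta$ the assigned slow-fading coefficient. The names `xi`, `sigma_sq_closed_form`, and `scalar_objective` are hypothetical.

```python
import numpy as np

def scalar_objective(s, xi, beta, T_ul, rho_ul):
    """Per-user part of l_ul + l_pr as a function of s = sigma^2 (constants dropped)."""
    return xi ** 2 * s / (s + 1.0 / rho_ul) - T_ul * np.log(rho_ul * s + 1.0) - s / beta

def sigma_sq_closed_form(xi, beta, T_ul, rho_ul):
    """Maximizer of scalar_objective over s >= 0.

    Setting the derivative to zero with u = s + 1/rho_ul gives the quadratic
    u^2 + T_ul*beta*u - beta*xi^2/rho_ul = 0; its positive root is clipped at s = 0.
    """
    u = 0.5 * (np.sqrt((T_ul * beta) ** 2 + 4.0 * beta * xi ** 2 / rho_ul) - T_ul * beta)
    return max(u - 1.0 / rho_ul, 0.0)
```

A brute-force grid search over `s` reproduces the closed-form value, including the case where a weak singular value is clipped to zero.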

The singular values are not necessarily relevant for practical purposes, i.e., linear precoding and filtering.
The main observation here is that we can estimate the channels up to a scalar ambiguity by applying
an SVD to the uplink data Y and correctly assigning the left singular vectors to the different users. The 
assignment of the channel vectors to users is possible if the slow fading coefficients in B are known. 

Note that the estimated channel vectors are already orthogonal since they are left singular vectors of 
Y. Consequently, the matched filter and the zero-forcing filter based on this estimate are equivalent. 
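
The subspace part of the blind estimate can be illustrated with a short sketch. The function names are hypothetical, and the assignment of singular vectors to users via the slow-fading coefficients is deliberately left out; only the orthonormal basis taken from the SVD of $Y$ is shown.

```python
import numpy as np

def blind_subspace(Y, n_users):
    """Orthonormal basis W = U[:, :KL] taken from the SVD of the uplink data Y."""
    U, _, _ = np.linalg.svd(Y, full_matrices=False)   # singular values come sorted descendingly
    return U[:, :n_users]

def projection_residual(H, W):
    """Relative energy of the true channels H outside the estimated subspace span(W)."""
    return np.linalg.norm(H - W @ (W.conj().T @ H)) / np.linalg.norm(H)
```

Because the columns of `W` are orthonormal, `np.linalg.pinv(W)` equals `W.conj().T`, which is the numerical counterpart of the statement that the matched filter and the zero-forcing filter coincide for this estimate.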

A major cause of estimation errors of the blind estimator is the presence of users with similar slow fading coefficients.
Even for perfectly orthogonal channel vectors, which is fulfilled for $M \to \infty$, the corresponding
subspaces cannot be separated with the SVD of the uplink data and the estimated channel vectors are a
linear combination of the actual channel vectors of those users with similar slow fading coefficients. To
accurately separate channels to different users with blind estimation we need not only a large number of 
antennas, but also a large number of data samples within one coherence interval. The first condition is 
met in a massive MIMO system, but the length of the coherence interval is an inherent property of the 
wireless channel and thus a sufficient number of data samples cannot be guaranteed. 
V. Semi-blind Estimation


If only a small number of uplink data samples is available, the blind estimation approach discussed in 
the previous section can deliver poor results, because the subspace estimate might be inaccurate. In this 
section, we therefore consider the joint semi-blind estimation based on both, uplink data and training 
signals. 

We first discuss the MAP problem for this setup, which is non-convex in the semi-blind case. Then,
we introduce a low-complexity heuristic estimation method based on subspace projection that can be used
to initialize an unconstrained solver for the MAP problem.

We want to calculate an estimate of the channel $H_1$ given the observations $Y$ of the data and $\Phi$ of the
pilot signals. However, difficult marginalization steps can be avoided by jointly estimating all channels.
With $H = [H_1, \ldots, H_L]$ [see (19)], the semi-blind MAP estimate for all channels to base station 1 is
given by

$$\hat{H}^{\mathrm{SB}} = \operatorname*{argmax}_{H} f_{Y|H}(Y \,|\, H)\, f_{\Phi|H}(\Phi \,|\, H)\, f_{H}(H) = \operatorname*{argmax}_{H} l_{\mathrm{tr}}(H) + l_{\mathrm{pr}}(H) + l_{\mathrm{ul}}(H) \qquad (23)$$

since the received data signal $Y$ and the received pilots $\Phi$ are independent when conditioned on the
total channel matrix $H$. All of the probability density functions in (23) are circularly symmetric complex
Gaussian and follow directly from the system model in (3), (5), and (6). The corresponding log-likelihood
function for the training signal is given by [cf. (11)]

$$l_{\mathrm{tr}}(H) = - \left\| \Phi - \sqrt{\rho_{\mathrm{tr}}}\, H \Psi^{\mathrm{H}} \right\|_{\mathrm{F}}^{2} \qquad (24)$$

with $\Psi = [\Psi_1, \ldots, \Psi_L]$. The log-likelihood functions for the prior and the uplink signals are
given in (37) and (36), respectively.

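Due to the non-convex nature of the objective function in (23), finding the global optimizer is difficult
in general. However, since the objective is differentiable, we can use any gradient based method to find
a local optimum from an initial guess.

For the derivatives, we have

$$\frac{\partial l_{\mathrm{ul}}(H)}{\partial H^{*}} = (I - G H^{\mathrm{H}})\, Y Y^{\mathrm{H}} G - T_{\mathrm{ul}} G \qquad (25)$$

for the data part, with

$$G = H \left( \frac{1}{\rho_{\mathrm{ul}}} I + H^{\mathrm{H}} H \right)^{-1}. \qquad (26)$$

For the prior part, we get

$$\frac{\partial l_{\mathrm{pr}}(H)}{\partial H^{*}} = -H B^{-1} = -[H_1 B_1^{-1}, \ldots, H_L B_L^{-1}] \qquad (27)$$

and for the training part,

$$\frac{\partial l_{\mathrm{tr}}(H)}{\partial H^{*}} = \sqrt{\rho_{\mathrm{tr}}} \left( \Phi - \sqrt{\rho_{\mathrm{tr}}}\, H \Psi^{\mathrm{H}} \right) \Psi. \qquad (28)$$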

A suitable method for finding a local optimum for the relatively large scale semi-blind MAP problem is,
e.g., the L-BFGS [29] algorithm, which is a limited-memory quasi-Newton method.
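
The objective (23) and its derivatives (25) to (28) translate directly into code. The sketch below is an illustration under stated assumptions, not the implementation used for the results: the function names are hypothetical, `b` holds the diagonal of $B$ as an array, and `Psi` is the stacked pilot matrix $[\Psi_1, \ldots, \Psi_L]$ of size $T_{\mathrm{tr}} \times KL$.

```python
import numpy as np

def map_objective(H, Y, Phi, Psi, b, rho_ul, rho_tr):
    """Semi-blind MAP objective l_tr + l_pr + l_ul of (23), constants dropped."""
    KL, T_ul = H.shape[1], Y.shape[1]
    M_inv = np.linalg.inv(H.conj().T @ H + np.eye(KL) / rho_ul)
    l_ul = (np.trace(Y @ Y.conj().T @ H @ M_inv @ H.conj().T).real
            - T_ul * np.linalg.slogdet(rho_ul * H.conj().T @ H + np.eye(KL))[1])
    l_pr = -np.sum(np.abs(H) ** 2 / b)                      # -tr(H B^{-1} H^H)
    l_tr = -np.linalg.norm(Phi - np.sqrt(rho_tr) * H @ Psi.conj().T) ** 2
    return l_tr + l_pr + l_ul

def map_gradient(H, Y, Phi, Psi, b, rho_ul, rho_tr):
    """Derivative of the objective w.r.t. H^*, i.e., the sum of (25), (27), and (28)."""
    M, KL = H.shape
    T_ul = Y.shape[1]
    G = H @ np.linalg.inv(np.eye(KL) / rho_ul + H.conj().T @ H)              # (26)
    g_ul = (np.eye(M) - G @ H.conj().T) @ (Y @ Y.conj().T) @ G - T_ul * G    # (25)
    g_pr = -H / b                                                            # (27)
    g_tr = np.sqrt(rho_tr) * (Phi - np.sqrt(rho_tr) * H @ Psi.conj().T) @ Psi  # (28)
    return g_ul + g_pr + g_tr
```

For a real-valued objective, a small step `eps * E` changes the value by approximately `2 * eps * np.vdot(E, map_gradient(...)).real`, which gives a simple finite-difference check before handing the gradient to a quasi-Newton solver. A plain gradient ascent step would be `H + step * map_gradient(...)` with a suitably small `step`.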

An accurate initial guess is essential to obtain a good performance. To this end, we propose a heuristic
estimation method, which we term pilot-aware subspace projection (PASP). The basic idea of this heuristic
is to take the SVD of the uplink data $Y = U \Xi V^{\mathrm{H}}$ and for each user $(i, k)$ select a subset of the left
singular vectors in $U$ based on the slow-fading coefficients in $B$ and the allocation of pilot sequences.
The least squares estimate of the user is then projected onto the subspace spanned by the selected
singular vectors, i.e.,

$$\hat{h}_{ik}^{\mathrm{PASP}} = U \Lambda U^{\mathrm{H}} \hat{h}_{ik}^{\mathrm{LS}} \qquad (29)$$

where $\Lambda$ is a diagonal matrix with zeros and ones on the diagonal, selecting the desired singular vectors.

A detailed description of the method and the rationale behind it can be found in Appendix B and 
results for the convergence speed in simulations with different initializations are presented and discussed 
in Section VI. 

Note that, similarly to the blind MAP approach, the performance of semi-blind channel estimation
increases with the number of antennas and uplink data signals. The semi-blind approach
is therefore especially suitable for the considered large-scale massive MIMO systems. However, in contrast to the
blind method, the semi-blind estimation offers a strict improvement over the training based least-squares
estimation and thus does not fall off for a smaller number of available uplink signals.


A. Upper Bound 

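By combining the training signals with the uplink data signals in the estimation process, we basically
extend the training phase by the data phase for which we only have statistical information. Therefore, in the
hypothetical best case, a semi-blind estimation method has the same performance as an optimal estimation
with exact knowledge of all transmitted data symbols. That is, if a genie provides the transmitted data
symbols, the augmented pilot sequences in cell $i$ are given by

$$\Psi_i^{\mathrm{ub}} = \begin{bmatrix} \Psi_i \\ \frac{1}{\sqrt{T_{\mathrm{tr}}}} X_i \end{bmatrix} \in \mathbb{C}^{(T_{\mathrm{tr}} + T_{\mathrm{ul}}) \times K} \qquad (30)$$

where the scaling of $X_i$ follows from $\rho_{\mathrm{tr}} = \rho_{\mathrm{ul}} T_{\mathrm{tr}}$, so that the stacked observation $[\Phi, Y]$ obeys the training model (6).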
Fig. 1. CDF of the angle $\theta$ (in degrees) between estimated and actual subspace of the channel vectors for different estimation methods in a
network of $L = 21$ cells with $K = 4$ users per cell, $M = 200$ base station antennas, and $T_{\mathrm{ul}} = 200$ uplink data samples.


We calculate the upper bound assuming known data symbols based on the MAP/MMSE estimate discussed 
in Section III. 


VI. Results 

We compare the different approaches to channel estimation by simulation in a cellular system with
$L = 21$ cells in a wrap-around configuration, where each base station is equipped with $M = 200$ antennas
and $K = 4$ users are served in each cell. The pathloss of the users is calculated according to the urban
macro model in the ITU guidelines [30] with log-normal shadow fading with a standard deviation of
6 dB.


To evaluate the performance of the different algorithms we present results for the angle between the
estimated and the actual channel subspace, i.e.,

$$\theta(\hat{h}, h) = \cos^{-1}\left( \frac{|\hat{h}^{\mathrm{H}} h|}{\|\hat{h}\|_2\, \|h\|_2} \right) \qquad (31)$$


and since we are evaluating a communication system, we also present results for the achievable downlink 
rates, when using either a linear matched filter or a linear zero-forcing precoder based on the channel 
estimates. For the downlink rates we assume perfect knowledge of the equivalent channel at the user 
terminals. 
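
The angle metric in (31) is straightforward to compute. The following sketch uses only the standard library and treats channel vectors as sequences of complex numbers; the function name is chosen here for illustration.

```python
import math

def subspace_angle_deg(h_est, h):
    """Angle (31), in degrees, between the subspaces spanned by h_est and h."""
    inner = sum(a.conjugate() * b for a, b in zip(h_est, h))   # h_est^H h
    norm_est = math.sqrt(sum(abs(a) ** 2 for a in h_est))
    norm_h = math.sqrt(sum(abs(b) ** 2 for b in h))
    cosine = min(1.0, abs(inner) / (norm_est * norm_h))        # guard against rounding above 1
    return math.degrees(math.acos(cosine))
```

The metric is insensitive to a complex scaling of either vector, which matches the scalar ambiguity of the blind estimate: a vector and any nonzero complex multiple of it have an angle of zero.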

In Fig. 1, the empirical cumulative distribution functions (CDF) of the subspace estimation error
are depicted and in Fig. 2, the corresponding downlink rates are shown. We give the results for training- 
based least-squares estimation, blind estimation, and semi-blind MAP estimation as well as the CDFs of 
the genie-aided upper bound discussed in Section V-A. 

Regarding the accuracy of the channel estimation measured by the angle $\theta$ [see (31)], the results in Fig. 1
show an improvement of the semi-blind method compared to the LS channel estimation, and the genie-aided
channel estimation gives a performance upper bound. The blind approach also outperforms the
LS estimation in terms of accuracy of the estimated subspace, but delivers mixed results for the achievable
rates with a slight improvement for the worst 20% of the users. The semi-blind MAP approach of
Section V with the pilot-aware subspace projection (61) as initialization always yields a significant
performance gain. In the considered scenario, the average rate is increased by about 25% when we 
apply semi-blind channel estimation and especially the weak users benefit from the improved channel 
estimation with gains at the 5th percentile of about 800%. Also note that zero-forcing precoding only 
benefits the users with above average channel quality and for matched filter precoding, the blind estimation 
outperforms the other methods for strong users due to the inherent orthogonality of the channel estimates. 

In Fig. 3, we present the average rate performance of the different algorithms versus the length of the
received uplink data signals $T_{\mathrm{ul}}$. For the cell-edge users, the gain of the semi-blind MAP method is already
larger than 400% for 50 uplink samples, growing to a tenfold increase in performance for 400 uplink 
samples. The performance of blind estimation increases only slowly with the number of uplink time slots. 
In other words, in contrast to the blind MAP estimation, the proposed semi-blind MAP estimation always 
outperforms the training based least-squares estimate and offers significant performance gains even for a 
small number of received uplink data signals. Note that for the blind and the semi-blind estimation it is 
beneficial to have a large number of antennas due to the increasing structure in the uplink data. 
The unconstrained optimization for the MAP approach is performed with the limited-memory BFGS
(L-BFGS) algorithm [29]. In Fig. 4, the average user-rate performance of the MAP estimator is given 
for different initializations versus the number of iterations of the L-BFGS algorithm. Note that with all 
initializations, the iteration seems to converge to the same achievable rate. However, the convergence is 
rather slow which is due to the large dimensionality of the problem. 

If we use the least squares estimate as initial guess, the initial improvement is small due to the fact that 
the estimates for the different cells are linearly dependent or even identical when the same pilot sequences 
are reused in each cell. Using the blind estimate as a starting point yields improved convergence speed. 
The proposed semi-blind PASP approach, however, significantly outperforms both the least-squares and 
the blind estimates. Random initialization of the iteration does not result in satisfactory performance for 
a reasonable amount of iterations and should be avoided. 

VII. Conclusion 

We discussed the MAP channel estimation based on different observations of a base station in a 
cellular network. For training-based estimation the results are well-known and involve solving a linear 
system of equations. We were able to derive a quasi-analytical solution to the blind MAP problem
based on the SVD of the uplink data. However, system level simulations indicate that blind estimation 
delivers unsatisfactory results in a typical massive MIMO system. With the proposed semi-blind MAP 
estimator on the other hand, we are able to improve upon the estimation accuracy of a solely training 
based estimation. Our results indicate that the proposed semi-blind estimation approaches consistently 
outperform state-of-the-art training based channel estimation methods significantly. 

Appendix 

A. Proof for the Blind MAP Estimator 

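For the following derivation, we assume that $M > KL$, i.e., the number of antennas at the considered
base station is larger than the total number of users. Additionally, we need the probability density function
of the uplink data given the channel coefficients. Since we do not distinguish between desired and
interfering channels, we can write (5) more concisely as

$$Y = \sqrt{\rho_{\mathrm{ul}}}\, H X^{\mathrm{H}} + N_{\mathrm{ul}} \qquad (32)$$

where $X = [X_1, \ldots, X_L]$. Since all entries of both $X$ and $N_{\mathrm{ul}}$ are i.i.d. Gaussian with zero mean and
unit variance, the columns $y_t$, $t = 1, \ldots, T_{\mathrm{ul}}$, of the uplink data $Y$, given the channel matrix $H$, are also
i.i.d. Gaussian with zero mean. The covariance matrix of the $t$-th column calculates to

$$\mathrm{E}[y_t y_t^{\mathrm{H}}] = \rho_{\mathrm{ul}}\, \mathrm{E}[H x_t x_t^{\mathrm{H}} H^{\mathrm{H}}] + I = \rho_{\mathrm{ul}} H H^{\mathrm{H}} + I \qquad (33)$$

where $x_t^{\mathrm{H}}$ is the $t$-th row of $X$. The joint density for the uplink data $Y$ given the channel realizations
$H$ thus reads as

$$f_{Y|H}(Y \,|\, H) \propto \frac{\exp\left( -\operatorname{tr}\left[ Y^{\mathrm{H}} \left( \rho_{\mathrm{ul}} H H^{\mathrm{H}} + I \right)^{-1} Y \right] \right)}{\det^{T_{\mathrm{ul}}}\left( \rho_{\mathrm{ul}} H^{\mathrm{H}} H + I \right)}$$

$$\propto \frac{\exp\left( \operatorname{tr}\left[ Y Y^{\mathrm{H}} H \left( H^{\mathrm{H}} H + \frac{1}{\rho_{\mathrm{ul}}} I \right)^{-1} H^{\mathrm{H}} \right] \right)}{\det^{T_{\mathrm{ul}}}\left( \rho_{\mathrm{ul}} H^{\mathrm{H}} H + I \right)}$$

where we used the matrix inversion lemma to obtain the second line.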
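
The two identities behind this step, the matrix inversion lemma and the equality of the $M \times M$ and $KL \times KL$ determinants, can be checked numerically. The dimensions below are arbitrary and only serve as an example.

```python
import numpy as np

rng = np.random.default_rng(0)
M, KL, rho_ul = 6, 3, 4.0
H = (rng.standard_normal((M, KL)) + 1j * rng.standard_normal((M, KL))) / np.sqrt(2)

# Matrix inversion lemma: (rho H H^H + I)^{-1} = I - H (H^H H + I/rho)^{-1} H^H
lhs = np.linalg.inv(rho_ul * H @ H.conj().T + np.eye(M))
rhs = np.eye(M) - H @ np.linalg.inv(H.conj().T @ H + np.eye(KL) / rho_ul) @ H.conj().T

# Determinant identity: det(rho H H^H + I_M) = det(rho H^H H + I_KL)
logdet_big = np.linalg.slogdet(rho_ul * H @ H.conj().T + np.eye(M))[1]
logdet_small = np.linalg.slogdet(rho_ul * H.conj().T @ H + np.eye(KL))[1]
```

Both `lhs` and `rhs`, as well as the two log-determinants, agree up to rounding, so the likelihood can be evaluated with $KL \times KL$ instead of $M \times M$ matrices.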

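Due to the strict monotonicity of the logarithm, the MAP formulation (20) can be rewritten as

$$\hat{H}^{\mathrm{BL}} = \operatorname*{argmax}_{H} l_{\mathrm{ul}}(H) + l_{\mathrm{pr}}(H) \qquad (34)$$

where

$$l_{\mathrm{ul}}(H) = \operatorname{tr}\left[ Y Y^{\mathrm{H}} H \left( H^{\mathrm{H}} H + \frac{1}{\rho_{\mathrm{ul}}} I \right)^{-1} H^{\mathrm{H}} \right] \qquad (35)$$

$$\qquad\qquad - T_{\mathrm{ul}} \log\det\left( \rho_{\mathrm{ul}} H^{\mathrm{H}} H + I \right) \qquad (36)$$

and [cf. (14)]

$$l_{\mathrm{pr}}(H) = -\operatorname{tr}\left[ H B^{-1} H^{\mathrm{H}} \right] \qquad (37)$$

due to the channel model in (19).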

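In the following, we analyze the likelihood functions more closely. Let 

$$H = W \Sigma T^{\mathrm{H}} \in \mathbb{C}^{M \times KL} \tag{38}$$

denote the reduced singular value decomposition of $H$, where $\Sigma \in \mathbb{R}^{KL \times KL}$ is diagonal, $T \in \mathbb{C}^{KL \times KL}$ 
is unitary, and $W \in \mathbb{C}^{M \times KL}$ is subunitary due to the assumption that $M > KL$. Substituting the SVD 
of $H$ into $l_{\mathrm{ul}}$ yields 

$$l_{\mathrm{ul}}(W, \Sigma, T) = \operatorname{tr}\left[Y Y^{\mathrm{H}} W \Sigma \left(\Sigma^2 + \frac{1}{\rho_{\mathrm{ul}}} I\right)^{-1} \Sigma W^{\mathrm{H}}\right] - T_{\mathrm{ul}} \log\det\left(\rho_{\mathrm{ul}} \Sigma^2 + I\right). \tag{39}$$
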


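We observe that $l_{\mathrm{ul}}$ does not depend on the right singular vectors $T$ of $H$, whereas 

$$l_{\mathrm{pr}}(W, \Sigma, T) = -\operatorname{tr}\left[W \Sigma T^{\mathrm{H}} B^{-1} T \Sigma W^{\mathrm{H}}\right] = -\operatorname{tr}\left[\Sigma^2 T^{\mathrm{H}} B^{-1} T\right] \tag{40}$$

is independent of $W$. 

For given singular values $\Sigma$ and with 

$$C = Y Y^{\mathrm{H}} \tag{41}$$

$$D = \Sigma \left(\Sigma^2 + \frac{1}{\rho_{\mathrm{ul}}} I\right)^{-1} \Sigma \tag{42}$$

we can simplify $l_{\mathrm{ul}}$ as 

$$l_{\mathrm{ul}}(W) = \operatorname{tr}\left[W^{\mathrm{H}} C W D\right] \tag{43}$$

where the second term, which is independent of $W$, has been dropped. 

As $l_{\mathrm{pr}}(H)$ is independent of $W$, the optimal $W$ is the solution to 

$$W^{\mathrm{opt}} = \operatorname*{argmax}_{W}\; \operatorname{tr}\left[W^{\mathrm{H}} C W D\right] \quad \text{s.t.} \quad W^{\mathrm{H}} W = I. \tag{44}$$

The corresponding Lagrangian is given by 

$$L(W, \Lambda) = \operatorname{tr}\left(W^{\mathrm{H}} C W D\right) + \operatorname{tr}\left(\Lambda \left(W^{\mathrm{H}} W - I\right)\right) \tag{45}$$

where $\Lambda$ is the Hermitian Lagrangian multiplier. Differentiation with respect to $W^*$ leads to 

$$\frac{\partial L(W, \Lambda)}{\partial W^*} = C W D + W \Lambda = 0 \tag{46}$$

from which follows that 

$$W^{\mathrm{H}} C W D + \Lambda = 0 \tag{47}$$

since $W^{\mathrm{H}} W = I$. Consequently, 

$$W^{\mathrm{H}} C W D = D W^{\mathrm{H}} C W \tag{48}$$
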


since $\Lambda$ has to be Hermitian as the constraint in (44) is Hermitian. It can be inferred that the optimal left 
singular vectors $W$ of $H$ diagonalize the sample covariance $C$, i.e., with the eigenvalue decomposition 
$C = U \Sigma_Y^2 U^{\mathrm{H}}$ we have 

$$W = U \Pi' \tag{49}$$

where $U$ is the matrix of left singular vectors of $Y$ and $\Pi' \in \{0, 1\}^{M \times KL}$ is the left block of an $M \times M$ 
permutation matrix. Note that there is an ambiguity in every eigenvector of $C$, i.e., the corresponding 
column of $U$, with respect to a scalar multiplication with absolute value one, which we can move to the 
right singular vectors $T$ of $H$ so we do not have to consider it here. 
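
The inference from the commutation relation (48) to the eigenvector structure in (49) can be spelled out. The following is a sketch under two assumptions that are not stated explicitly in the text: the diagonal entries $d_1, \ldots, d_{KL}$ of $D$ are pairwise distinct and nonzero. The shorthand $A = W^{\mathrm{H}} C W$ is introduced here only for illustration.

```latex
\begin{align*}
  % entrywise comparison of both sides of the commutation relation A D = D A
  [A D]_{mn} = A_{mn} d_n, \quad [D A]_{mn} = d_m A_{mn}
    &\;\Rightarrow\; A_{mn} (d_n - d_m) = 0
     \;\Rightarrow\; A_{mn} = 0 \ \text{for } m \neq n, \\
  % insert Lambda = -A D from (47) into the stationarity condition (46)
  C W D = -W \Lambda = W A D
    &\;\Rightarrow\; C W = W A \quad \text{(since $D$ is invertible)}.
\end{align*}
```

With $A$ diagonal, $C W = W A$ states that every column of $W$ is an eigenvector of $C$, which is the structure expressed by (49).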

Substituting (49) into (44) yields 

$$\Pi'^{\,\mathrm{opt}} = \operatorname*{argmax}_{\Pi'}\; \operatorname{tr}\left[(\Pi')^{\mathrm{T}} \Sigma_Y^2 \Pi' D\right] \tag{50}$$

where $\Pi' \in \{0, 1\}^{M \times KL}$ and $\Sigma_Y^2$ contains the eigenvalues of $C$. Since the diagonal entries in $D$ are ordered decreasingly, as the entries of $\Sigma$ 
are in descending order [cf. (42)], the optimal choice for the selection matrix $\Pi'$ is 

$$\Pi'^{\,\mathrm{opt}} = \begin{bmatrix} I \\ 0 \end{bmatrix} \tag{51}$$

because we want to match the largest eigenvalues in $\Sigma_Y^2$ with the largest values in $D$ to maximize the 
trace. Thus, the optimal choice for the left singular vectors $W$ of $H$, when conducting blind estimation, 
is simply the matrix of principal left singular vectors $U_{1:KL}$ of $Y$. 
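
The selection argument can be checked by brute force on a toy instance. The sketch below assumes NumPy, small hypothetical dimensions, and made-up singular values of $H$ in descending order; it evaluates the trace objective for every ordered selection of $KL$ left singular vectors of $Y$ and reports the best one.

```python
import itertools

import numpy as np

rng = np.random.default_rng(1)
M, KL, T_ul, rho_ul = 5, 2, 6, 4.0  # small illustrative sizes

Y = (rng.standard_normal((M, T_ul)) + 1j * rng.standard_normal((M, T_ul))) / np.sqrt(2)
C = Y @ Y.conj().T

# Hypothetical singular values of H in descending order; D is then descending as well.
sigma = np.array([1.5, 0.7])
D = np.diag(sigma**2 / (sigma**2 + 1.0 / rho_ul))

# np.linalg.svd returns the singular values sorted in descending order.
U, s, _ = np.linalg.svd(Y)


def objective(cols):
    """Trace objective tr[W^H C W D] for W built from the selected columns of U."""
    W = U[:, list(cols)]
    return np.trace(W.conj().T @ C @ W @ D).real


# Every ordered selection of KL eigenvectors corresponds to one candidate selection matrix.
scores = {cols: objective(cols) for cols in itertools.permutations(range(M), KL)}
best = max(scores, key=scores.get)
print(best)
```

On this instance the maximizer is the selection of the first $KL$ columns in their original order, i.e., the principal left singular vectors of $Y$.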

A similar analysis can be applied to $l_{\mathrm{pr}}(H)$ [see (40)], which is independent of $W$, to get the optimal 
right singular vectors $T$. For given $\Sigma$, the optimal right singular vectors are given by 

$$T = \Pi \Phi \tag{52}$$

where $\Pi \in \{0, 1\}^{KL \times KL}$ is a permutation matrix such that the entries along the diagonal of $\Pi^{\mathrm{T}} B^{-1} \Pi$ 
are sorted ascendingly and $\Phi$ contains the unknown phase shifts mentioned above. Based on the 
permutation matrix $\Pi$, we can define a function $\pi(i, k)$ such that the $\pi(i, k)$th left singular vector of $Y$ 
spans the subspace of the blind estimate of user $k$ in cell $i$. The inverse mapping $(i, k) = \pi^{-1}(n)$ 
delivers the cell and user indices corresponding to the $n$th singular vector of the uplink data. 
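
The permutation and the index mapping can be illustrated with a few lines of NumPy. This is a minimal sketch under simplifying assumptions: the users of all cells are addressed by a single flat, zero-based index instead of the pair $(i, k)$, and the function name `blind_user_ordering` is hypothetical.

```python
import numpy as np


def blind_user_ordering(beta):
    """Sort users by slow fading coefficient (strongest first).

    Returns the permutation matrix Pi, the inverse mapping `order`
    (order[n] = flat user index of the n-th singular vector), and the
    forward mapping `rank` (rank[j] = singular vector index of user j).
    """
    beta = np.asarray(beta, dtype=float)
    order = np.argsort(-beta, kind="stable")
    Pi = np.eye(beta.size)[:, order]
    rank = np.empty_like(order)
    rank[order] = np.arange(beta.size)
    return Pi, order, rank


beta = np.array([0.5, 2.0, 1.0])  # illustrative slow fading coefficients
Pi, order, rank = blind_user_ordering(beta)

# Diagonal of Pi^T B^{-1} Pi, which should come out in ascending order.
diag_sorted = np.diag(Pi.T @ np.diag(1.0 / beta) @ Pi)
print(order, rank, diag_sorted)
```

Here `rank` plays the role of $\pi(\cdot)$ and `order` the role of $\pi^{-1}(\cdot)$; the strongest user is matched to the first singular vector.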

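With the results for the left and right singular vectors, we can now optimize the singular values $\Sigma$. 
Substituting the previous results, the optimization problem reads as 

$$\begin{aligned}
\Sigma^{\mathrm{opt}} &= \operatorname*{argmax}_{\Sigma \succeq 0}\; \operatorname{tr}\left[\Sigma_{Y,\mathrm{p}}^2 \Sigma^2 \left(\Sigma^2 + \frac{1}{\rho_{\mathrm{ul}}} I\right)^{-1}\right] - T_{\mathrm{ul}} \log\det\left(\rho_{\mathrm{ul}} \Sigma^2 + I\right) - \operatorname{tr}\left[\Sigma^2 \Pi^{\mathrm{T}} B^{-1} \Pi\right] \\
&= \operatorname*{argmax}_{\Sigma \succeq 0}\; l(\Sigma^2)
\end{aligned} \tag{53}$$

where $\Sigma_{Y,\mathrm{p}} = \operatorname{diag}(\sigma_1, \ldots, \sigma_{KL})$ denotes the matrix with the principal $KL$ singular values of $Y$ along 
the diagonal. 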

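Setting the derivatives of the objective function with respect to the diagonal entries $\xi_n$ of $\Sigma^2$, 

$$\frac{\partial l(\Sigma^2)}{\partial \xi_n} = \frac{\sigma_n^2 / \rho_{\mathrm{ul}}}{\left(\xi_n + 1/\rho_{\mathrm{ul}}\right)^2} - \frac{T_{\mathrm{ul}}}{\xi_n + 1/\rho_{\mathrm{ul}}} - \frac{1}{\beta_{\pi^{-1}(n)}}, \tag{54}$$

to zero yields 

\begin{align}
0 &= \left(\xi_n + \frac{1}{\rho_{\mathrm{ul}}}\right)^2 + \beta_{\pi^{-1}(n)} T_{\mathrm{ul}} \left(\xi_n + \frac{1}{\rho_{\mathrm{ul}}}\right) - \frac{\beta_{\pi^{-1}(n)} \sigma_n^2}{\rho_{\mathrm{ul}}} \tag{55} \\
  &= \left(\xi_n + \frac{1}{\rho_{\mathrm{ul}}} + \frac{\beta_{\pi^{-1}(n)} T_{\mathrm{ul}}}{2}\right)^2 - \frac{\beta_{\pi^{-1}(n)}^2 T_{\mathrm{ul}}^2}{4} - \frac{\beta_{\pi^{-1}(n)} \sigma_n^2}{\rho_{\mathrm{ul}}}. \tag{56}
\end{align}

Since $\xi_n \geq 0$, the optimal $\xi_n$ are thus given by 

$$\xi_n = \max\left(0,\; -\frac{1}{\rho_{\mathrm{ul}}} - \frac{\beta_{\pi^{-1}(n)} T_{\mathrm{ul}}}{2} + \sqrt{\frac{\beta_{\pi^{-1}(n)}^2 T_{\mathrm{ul}}^2}{4} + \frac{\beta_{\pi^{-1}(n)} \sigma_n^2}{\rho_{\mathrm{ul}}}}\right). \tag{57}$$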
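
The closed-form singular values can be evaluated and sanity-checked numerically. The sketch below assumes NumPy; the function names, the symbol `xi` for the squared singular values of the channel, and the example numbers are illustrative and not taken from the paper. It computes the nonnegative root of the stationarity condition and also provides the per-mode objective, so that the result can be compared against a grid search.

```python
import numpy as np


def optimal_squared_singular_values(sigma_y, beta, T_ul, rho_ul):
    """Nonnegative maximizer of the per-mode objective in closed form.

    sigma_y: principal singular values of the uplink data Y
    beta:    slow fading coefficients, ordered like the singular vectors
    """
    sigma_y = np.asarray(sigma_y, dtype=float)
    beta = np.asarray(beta, dtype=float)
    half = 0.5 * beta * T_ul
    root = np.sqrt(half**2 + beta * sigma_y**2 / rho_ul)
    # Clip at zero because squared singular values cannot be negative.
    return np.maximum(0.0, root - half - 1.0 / rho_ul)


def per_mode_objective(xi, sigma_y, beta, T_ul, rho_ul):
    """Contribution of one singular value to the log-posterior (up to constants)."""
    return (sigma_y**2 * xi / (xi + 1.0 / rho_ul)
            - T_ul * np.log(rho_ul * xi + 1.0)
            - xi / beta)


T_ul, rho_ul = 10, 1.0
sigma_y = np.array([20.0, 1.0])  # one strong and one weak mode
beta = np.array([1.0, 1.0])
xi = optimal_squared_singular_values(sigma_y, beta, T_ul, rho_ul)
print(xi)
```

For the weak mode the unconstrained root is negative, so the estimate is clipped to zero, which corresponds to switching that mode off.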


B. Pilot-Aware Subspace Projection 

From Section IV, we know that the blind estimation of the channel is unique up to a scalar ambiguity. 
The direction information resulting from the blind estimation of the channel to user $k$ in cell $i$ is given 
by the $\pi(i, k)$th left singular vector $u_{\pi(i,k)}$ of the uplink data $Y$ [see the discussion below (52)]. 

By projecting the training-based least-squares estimate $\hat{h}_{ik}^{\mathrm{LS}}$ onto the one-dimensional space spanned by the 
blind estimate, we get a unique estimate of the channel vector 

$$\hat{h}_{ik} = u_{\pi(i,k)} u_{\pi(i,k)}^{\mathrm{H}} \hat{h}_{ik}^{\mathrm{LS}}. \tag{58}$$

If the direction of the blind estimate is accurate, the projection cancels out most of the interference in 
the least-squares estimate. However, the projection in (58) does not improve the estimation of the spatial 
direction of the channel vector w.r.t. the blind estimate. Indeed, it only removes the scalar ambiguity of 
the blind estimate, which is, however, of small importance or even irrelevant in most cases. 

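Alternatively, we can collect $R > K$ left singular vectors of $Y$ corresponding to the $R$ largest singular 
values in the matrix $U_{1:R}$ and project the least-squares estimate $\hat{h}_{ik}^{\mathrm{LS}}$ onto the range of $U_{1:R}$, i.e., 

$$\hat{h}_{ik} = U_{1:R} U_{1:R}^{\mathrm{H}} \hat{h}_{ik}^{\mathrm{LS}}. \tag{59}$$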

Note that the number of basis vectors R must be chosen large enough such that all significant parts of the 
desired channel vectors lie in the resulting subspace. With this heuristic strategy, noise and weak interferers 
can be suppressed. However, the channel estimate is still subject to pilot-contamination originating from 
strong interfering users. 

We can generalize the semi-blind approaches in (58) and (59) by incorporating a diagonal weighting 
matrix into the projection, resulting in 

$$\hat{h}_{ik} = U_{1:R} \operatorname{diag}(\lambda_{ik})\, U_{1:R}^{\mathrm{H}} \hat{h}_{ik}^{\mathrm{LS}} \tag{60}$$

where the weights $\lambda_{ik}$ are chosen heuristically and depend on both the slow fading coefficients and the 
pilot sequences of all relevant users. 

If we choose $\lambda_{ik} = \mathbf{1}$, we get the full projection on the $R$-dimensional subspace in (59), which is 
similar to the subspace projection proposed in [25], but extended to the practically relevant case of 
strong interferers. For $\lambda_{ik} = e_{\pi(i,k)}$, the $\pi(i, k)$th unit vector, we get the projection in (58), which performs equivalently to the 
blind estimate. 
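
The weighted projection and its two special cases can be sketched as follows. This is an illustrative NumPy implementation under stated assumptions: the function name `weighted_subspace_projection` is hypothetical, the data are random placeholders, and the number of weights $R$ must not exceed the number of available left singular vectors of $Y$.

```python
import numpy as np


def weighted_subspace_projection(Y, h_ls, weights):
    """Apply U_{1:R} diag(weights) U_{1:R}^H to the least-squares estimate h_ls.

    The R = len(weights) principal left singular vectors of Y form the basis.
    """
    weights = np.asarray(weights, dtype=float)
    U, _, _ = np.linalg.svd(Y, full_matrices=False)  # singular values sorted descending
    U_R = U[:, : weights.size]
    return U_R @ (weights * (U_R.conj().T @ h_ls))


rng = np.random.default_rng(2)
M, T_ul, R = 6, 10, 3  # illustrative sizes
Y = rng.standard_normal((M, T_ul)) + 1j * rng.standard_normal((M, T_ul))
h_ls = rng.standard_normal(M) + 1j * rng.standard_normal(M)

# All-ones weights: full projection onto the R-dimensional subspace.
full = weighted_subspace_projection(Y, h_ls, np.ones(R))

# Unit-vector weights: rank-one projection onto a single singular vector
# (zero-based index 1 here, chosen arbitrarily for the example).
rank_one = weighted_subspace_projection(Y, h_ls, np.eye(R)[1])
```

Any weight vector with entries in $\{0, 1\}$ selects a subset of the basis vectors, so the two special cases are the extremes of a family of projections with intermediate dimension.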

Our goal is to find a heuristic which combines the subspace information from the blind estimation 
with the LS estimate obtained based on the training data in a way such that the interference in the LS 
estimate is reduced. Suppose all base stations use a fixed set of orthonormal training sequences $\mathcal{V}$ with 
$|\mathcal{V}| = T_{\mathrm{tr}}$, i.e., we have $\psi_{ik} \in \mathcal{V}$ for each training sequence in all cells. Therefore, the channel estimate 
of a desired user suffers from contamination originating from users in neighboring cells that employ the 
same training sequence. 

By the choice of $\lambda_{ik}$, the LS estimate is projected onto the space spanned by several of the singular 
vectors, that is, onto $\operatorname{span}(\{u_n\}_{n=a}^{b})$ with $a \leq \pi(i, k) \leq b$, to reduce the impact of inaccuracies in 
the singular vectors due to users with similar channel gains, while still reducing significant parts of the 
interference. 

Let us denote with $l < \pi(i, k)$ the index of the closest (in terms of channel quality) interfering user with 
better channel quality and $u > \pi(i, k)$ the index of the closest interfering user with lower channel quality. 
To achieve the goal of interference suppression, none of the singular vectors $n = a, a+1, \ldots, b$ should 
correspond to an interfering user, e.g., with the following heuristic choice for the window coefficients 


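$$\lambda_{ik,n} = \begin{cases} 1 & \sqrt{\beta_{\pi^{-1}(l)}\, \beta_{ik}} \geq \beta_{\pi^{-1}(n)} \geq \sqrt{\beta_{\pi^{-1}(u)}\, \beta_{ik}} \\ 0 & \text{else.} \end{cases} \tag{61}$$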

In other words, we project onto the basis of the left singular vectors of Y that are “closer” to that 
corresponding to the desired user than to any interfering user, where we use the geometric mean to 
define the point of equivalent distance. 
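
The window selection can be sketched in code. The example below assumes NumPy, zero-based indices, and slow fading coefficients that are already sorted in descending order so that position `n` corresponds to the `n`th singular vector. The function name and the handling of the edge cases (no stronger or no weaker interferer, treated as an unbounded window on that side) are assumptions made for illustration and are not specified in the text.

```python
import numpy as np


def window_coefficients(beta_sorted, desired, interferers):
    """Binary window around the desired user based on geometric-mean thresholds.

    beta_sorted: slow fading coefficients in descending order
    desired:     position of the desired user in that ordering
    interferers: positions of the users sharing the desired user's pilot
    """
    beta_sorted = np.asarray(beta_sorted, dtype=float)
    better = [n for n in interferers if n < desired]
    worse = [n for n in interferers if n > desired]
    # Assumption: without an interferer on one side, the window is unbounded there.
    upper = np.sqrt(beta_sorted[max(better)] * beta_sorted[desired]) if better else np.inf
    lower = np.sqrt(beta_sorted[min(worse)] * beta_sorted[desired]) if worse else 0.0
    return ((beta_sorted <= upper) & (beta_sorted >= lower)).astype(float)


beta_sorted = np.array([10.0, 8.0, 4.0, 3.5, 1.0, 0.5])  # illustrative values
lam = window_coefficients(beta_sorted, desired=2, interferers=[0, 4])
print(lam)
```

In this example the thresholds are $\sqrt{10 \cdot 4} \approx 6.32$ and $\sqrt{1 \cdot 4} = 2$, so only the desired user and its close neighbor with coefficient 3.5 remain inside the window.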

With this approach, we project the training based estimate onto a higher dimensional subspace which 
leads to an improved estimate of the channel direction, while still suppressing major parts of the pilot- 
contamination. 

Note that this heuristic method cannot estimate the CSI accurately if an interfering 
user that employs the same pilot sequence has a slow fading coefficient that is very similar to that 
of the desired user. This indicates that the estimation performance can be improved by assigning pilots 
accordingly. 
Fig. 3. Achievable rate for different lengths of the uplink data transmission interval $T_{\mathrm{ul}}$, on the left with zero-forcing and on 
the right with matched filter. The upper plots present the average rate and the lower ones the rate at the 5th percentile, i.e., the 
rate of the cell-edge users. 
Fig. 4. Performance for different initializations of the semi-blind MAP estimator with respect to the number of iterations used 
to find an optimizer for the MAP problem. 